Node Embedding over Temporal Graphs
In this work, we present a method for node embedding in temporal graphs. We
propose an algorithm that learns the evolution of a temporal graph's nodes and
edges over time and incorporates these dynamics into a temporal node embedding
framework for different graph prediction tasks. We present a joint loss
function that creates a temporal embedding of a node by learning to combine its
historical temporal embeddings such that it is optimized for a given task
(e.g., link prediction). The algorithm is initialized using static node embeddings,
which are then aligned over the representations of a node at different time
points, and eventually adapted for the given task in a joint optimization. We
evaluate the effectiveness of our approach over a variety of temporal graphs
for the two fundamental tasks of temporal link prediction and multi-label node
classification, comparing to competitive baselines and algorithmic
alternatives. Our algorithm shows performance improvements across many of the
datasets and baselines, and is found particularly effective for less cohesive
graphs with a lower clustering coefficient.
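The abstract does not spell out the combination mechanism, but the core idea of learning to weight a node's historical embeddings into one task-specific vector can be sketched as follows; the softmax weighting and all names here are illustrative assumptions, not the paper's actual loss:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def combine_history(hist_embeds, weights):
    # hist_embeds: (T, d) array -- the node's embedding at each of T time points
    # weights:     (T,)  array  -- learnable scores (fixed here for illustration)
    alpha = softmax(weights)        # attention weights over time points
    return alpha @ hist_embeds      # (d,) combined temporal embedding

# toy example: 3 time points, 2-d embeddings
hist = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
z = combine_history(hist, np.zeros(3))  # uniform weights give a simple average
```

In the paper these weights would be trained jointly with the downstream task loss (e.g., link prediction) rather than fixed.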
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
The ability to collect a large dataset of human preferences from
text-to-image users is usually limited to companies, making such datasets
inaccessible to the public. To address this issue, we create a web app that
enables text-to-image users to generate images and specify their preferences.
Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image
prompts and real users' preferences over generated images. We leverage this
dataset to train a CLIP-based scoring function, PickScore, which exhibits
superhuman performance on the task of predicting human preferences. Then, we
test PickScore's ability to perform model evaluation and observe that it
correlates better with human rankings than other automatic evaluation metrics.
Therefore, we recommend using PickScore for evaluating future text-to-image
generation models, and using Pick-a-Pic prompts as a more relevant dataset than
MS-COCO. Finally, we demonstrate how PickScore can enhance existing
text-to-image models via ranking.
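Since PickScore is described as a CLIP-based scoring function, ranking candidate generations reduces to comparing prompt and image embeddings. A minimal sketch with toy vectors standing in for real CLIP embeddings (the cosine scoring and all names are assumptions for illustration):

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def rank_images(prompt_emb, image_embs):
    # Score each candidate image against the prompt and return
    # indices ordered best-first. With PickScore, the embeddings
    # would come from the fine-tuned CLIP model.
    scores = [cosine(prompt_emb, img) for img in image_embs]
    return np.argsort(scores)[::-1]

prompt = np.array([1.0, 0.0])
images = [np.array([0.0, 1.0]),   # off-prompt
          np.array([1.0, 0.1]),   # closest match
          np.array([0.5, 0.5])]   # partial match
order = rank_images(prompt, images)
```

The "enhance via ranking" use case from the abstract is then simply generating several candidates and keeping the top-ranked one.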
AudioGen: Textually Guided Audio Generation
We tackle the problem of generating audio samples conditioned on descriptive
text captions. In this work, we propose AudioGen, an auto-regressive
generative model that generates audio samples conditioned on text inputs.
AudioGen operates on a learnt discrete audio representation. The task of
text-to-audio generation poses multiple challenges. Due to the way audio
travels through a medium, differentiating ``objects'' can be a difficult task
(e.g., separating multiple people simultaneously speaking). This is further
complicated by real-world recording conditions (e.g., background noise,
reverberation, etc.). Scarce text annotations impose another constraint,
limiting the ability to scale models. Finally, modeling high-fidelity audio
requires encoding audio at high sampling rate, leading to extremely long
sequences. To alleviate the aforementioned challenges, we propose an
augmentation technique that mixes different audio samples, driving the model to
internally learn to separate multiple sources. We curated 10 datasets
containing different types of audio and text annotations to handle the scarcity
of text-audio data points. For faster inference, we explore the use of
multi-stream modeling, allowing the use of shorter sequences while maintaining
a similar bitrate and perceptual quality. We apply classifier-free guidance to
improve adherence to text. Compared to the evaluated baselines, AudioGen
performs better on both objective and subjective metrics. Finally, we explore the
ability of the proposed method to generate audio continuation conditionally and
unconditionally. Samples: https://tinyurl.com/audiogen-text2audi
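The mixing augmentation described above can be sketched as combining two waveforms at a chosen signal-to-noise ratio, so the model sees mixtures and must learn to separate the sources internally. AudioGen's exact gains and caption handling are not given in the abstract, so this is an illustrative sketch only:

```python
import numpy as np

def mix_samples(wave_a, wave_b, snr_db=0.0):
    # Scale wave_b so that wave_a sits at the requested SNR (in dB)
    # relative to wave_b, then sum the two waveforms.
    power_a = np.mean(wave_a ** 2)
    power_b = np.mean(wave_b ** 2)
    gain = np.sqrt(power_a / (power_b * 10 ** (snr_db / 10)))
    return wave_a + gain * wave_b

# toy 1-second signals at an (assumed) 8 kHz sampling rate
t = np.linspace(0, 1, 8000)
speech = np.sin(2 * np.pi * 440 * t)
noise = np.sin(2 * np.pi * 220 * t)
mixed = mix_samples(speech, noise, snr_db=0.0)  # equal-power mixture
```

In training, the text annotations of the two source clips would also need to be merged into one caption for the mixture.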
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented,
token-based, decoder-only multi-modal language model capable of generating and
infilling both text and images. CM3Leon uses the CM3 multi-modal architecture
but additionally shows the extreme benefits of scaling up and tuning on more
diverse instruction-style data. It is the first multi-modal model trained with
a recipe adapted from text-only language models, including a large-scale
retrieval-augmented pre-training stage and a second multi-task supervised
fine-tuning (SFT) stage. It is also a general-purpose model that can do both
text-to-image and image-to-text generation, allowing us to introduce
self-contained contrastive decoding methods that produce high-quality outputs.
Extensive experiments demonstrate that this recipe is highly effective for
multi-modal models. CM3Leon achieves state-of-the-art performance in
text-to-image generation with 5x less training compute than comparable methods
(zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate
unprecedented levels of controllability in tasks ranging from language-guided
image editing to image-controlled generation and segmentation.
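The abstract mentions self-contained contrastive decoding; a common form of the idea contrasts the conditional distribution with an unconditional (or weaker) one, so tokens favoured by the conditioning get boosted. The combination rule and `alpha` below are illustrative assumptions, not CM3Leon's published procedure:

```python
import numpy as np

def contrastive_logits(cond_logits, uncond_logits, alpha=1.0):
    # Push the conditional logits away from the unconditional ones:
    # tokens the conditioning favours are amplified, generic tokens damped.
    cond = np.asarray(cond_logits)
    uncond = np.asarray(uncond_logits)
    return cond + alpha * (cond - uncond)

def greedy_pick(cond_logits, uncond_logits, alpha=1.0):
    # Greedy token choice under the contrastive score.
    return int(np.argmax(contrastive_logits(cond_logits, uncond_logits, alpha)))

# token 0 is likely regardless of the prompt; token 1 is prompt-specific
picked = greedy_pick([2.0, 1.5, 0.0], [2.0, 0.0, 0.0])
```

Plain greedy decoding on the conditional logits alone would pick token 0 here; the contrastive score instead selects the prompt-specific token.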
Solution structure and electrostatic properties of an SH2 domain/phosphopeptide complex
grantor: University of Toronto
SH2 domains are small (~100 amino acid) protein recognition domains found in numerous proteins involved in signal transduction, which bind to sites of tyrosine phosphorylation with high affinity in a sequence-dependent manner. We have focused on the SH2 domains of phospholipase C-γ (PLC-γ), which provides a link between activated growth factor receptors, via binding through its two SH2 domains, and the production of the second messengers IP3 and DAG. PLC-γ interacts with the platelet-derived growth factor receptor (PDGFR) at sequences around Tyr 1021 of the PDGFR, and disruption of this interaction results in decreased cell growth following growth factor stimulation. Binding studies using degenerate phosphopeptide libraries suggest that this interaction involves the C-terminal (PLCC), and not the N-terminal (PLCN), SH2 domain of PLC-γ. Thus, we have studied this interaction involving the PLCC SH2 domain and a 12 amino acid phosphopeptide representing sequences around Tyr 1021 using heteronuclear NMR techniques. I was involved in the cloning and purification of this SH2 domain and the preparation of NMR samples of this protein/peptide complex. A full structural determination was performed on this complex in collaboration with Dr. Steve Pascal. During structure determination, I defined the conformation of the phosphopeptide in this complex, as well as demonstrating protein-peptide contacts. Protein/peptide NOEs involving pTyr resonances defined a large positively charged pocket, containing four arginine residues, which bound this residue. NOEs could not define contacts with the pTyr phosphate group, so we used large downfield chemical shift changes of guanidinium group resonances to do so. pH titration studies demonstrated that the pTyr phosphate group is bound in the -2 charge state, with several residues held in place by a complex hydrogen bonding network to facilitate pTyr binding.
A large hydrophobic cavity on the SH2 domain surface bound six residues C-terminal to pTyr; in particular, the Ile +1 and Pro +3 residues were deeply buried. Thus, a combination of NMR techniques involving NMR assignment, structure determination, and pH titration studies provided significant insights into the specific binding of SH2 domains.
EqGNN: Equalized Node Opportunity in Graphs
Graph neural networks (GNNs) have been widely used for supervised learning tasks on graphs, reaching state-of-the-art results. However, little work has been dedicated to creating unbiased GNNs, i.e., ones whose classification is uncorrelated with sensitive attributes such as race or gender. Some approaches ignore the sensitive attributes or optimize for the criterion of statistical parity for fairness. However, it has been shown that neither approach ensures fairness, but rather cripples the utility of the prediction task.
In this work, we present a GNN framework that allows optimizing representations for the Equalized Odds fairness criterion. The architecture is composed of three components: (1) a GNN classifier predicting the utility class, and (2) a sampler learning the distribution of the sensitive attributes of the nodes given their labels, which generates samples fed into (3) a discriminator that discriminates between true and sampled sensitive attributes using a novel ``permutation loss'' function. Using these components, we train a model to ignore information regarding the sensitive attribute only with respect to its label. To the best of our knowledge, we are the first to optimize GNNs for the equalized odds criterion. We evaluate our classifier over several graph datasets and sensitive attributes and show that our algorithm reaches state-of-the-art results.
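For reference, Equalized Odds requires that prediction rates match across sensitive groups conditional on the true label. A minimal sketch of measuring the violation, assuming binary labels and a binary sensitive attribute (this is the evaluation criterion, not the paper's adversarial training procedure):

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, sensitive):
    # For each true label, compare the positive-prediction rate of the
    # two sensitive groups; return the largest gap. A fair model under
    # equalized odds drives this toward zero.
    gaps = []
    for y in (0, 1):                      # condition on the true label
        rates = []
        for s in (0, 1):
            mask = (y_true == y) & (sensitive == s)
            rates.append(y_pred[mask].mean())
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
sensitive = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
gap = equalized_odds_gap(y_true, y_pred, sensitive)
```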
Is Oprah Contagious? The Depth of Diffusion of Demand Shocks in a Product Network
Recent studies have documented that the contagion of information and behaviors in social networks is generally quite limited. We examine whether this pattern characterizes exogenous demand shocks diffusing in a product network. To this end, we analyze a unique series of demand shocks induced by mass-media book reviews on the Oprah Winfrey television show and in The New York Times. Our identification strategy is based on a difference-in-differences model estimated using two different control groups, based on propensity-score matching and on network proximity to a reviewed book, respectively. Our results show that the diffusion of exogenous demand shocks in the Amazon.com product network is relatively shallow, typically about three edges deep into the network, although the economic impact of this diffusion can often be significant. We link our results to recent findings in the context of diffusion in social networks and discuss managerial implications.
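The difference-in-differences estimator behind this identification strategy compares the pre/post change for reviewed (treated) books against the change for matched controls. A toy sketch (the sales figures are invented for illustration):

```python
import numpy as np

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    # DiD effect = (treated change over time) - (control change over time);
    # the control change absorbs common trends unrelated to the review.
    return (np.mean(treat_post) - np.mean(treat_pre)) - \
           (np.mean(ctrl_post) - np.mean(ctrl_pre))

# toy weekly sales: the reviewed book jumps after the review, controls drift
effect = diff_in_diff(treat_pre=[10, 12], treat_post=[30, 32],
                      ctrl_pre=[10, 11], ctrl_post=[12, 13])
```

In the paper this comparison is run in a regression framework with the two control-group definitions (propensity-score matched and network-proximate), not the raw means shown here.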
KNN-Diffusion: Image Generation via Large-Scale Retrieval
While the availability of massive Text-Image datasets is shown to be
extremely useful in training large-scale generative models (e.g. DDPMs,
Transformers), their output typically depends on the quality of both the input
text, as well as the training dataset. In this work, we show how large-scale
retrieval methods, in particular efficient K-Nearest-Neighbors (KNN) search,
can be used to train a model to adapt to new samples. Learning to
adapt enables several new capabilities. Sifting through billions of records at
inference time is extremely efficient and can alleviate the need to train or
memorize an adequately large generative model. Additionally, fine-tuning
trained models to new samples can be achieved by simply adding them to the
table. Rare concepts, even without any presence in the training set, can be
then leveraged during test time without any modification to the generative
model. Our diffusion-based model trains on images only, by leveraging a joint
Text-Image multi-modal metric. Compared to baseline methods, our generations
achieve state-of-the-art results both in human evaluations and in perceptual
scores when tested on a public multimodal dataset of natural images,
as well as on a collected dataset of 400 million Stickers.
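The retrieval step reduces to nearest-neighbour search in a shared text-image embedding space. A brute-force toy sketch; the paper runs efficient KNN over billions of records with a proper index, and the inner-product similarity and toy vectors here are illustrative assumptions:

```python
import numpy as np

def knn_retrieve(query_emb, index_embs, k=3):
    # Score every indexed embedding against the query by inner product
    # and return the indices of the k most similar entries, best first.
    sims = index_embs @ query_emb
    return np.argsort(sims)[::-1][:k]

# toy index of 2-d embeddings standing in for a multi-modal metric space
index = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7, 0.7],
                  [-1.0, 0.0]])
neighbours = knn_retrieve(np.array([1.0, 0.1]), index, k=2)
```

Adding new samples (including rare concepts absent from training) then only means appending rows to `index` -- no change to the generative model itself, which matches the fine-tuning-by-table-addition claim in the abstract.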